Towards Knowledge Discovery from the Vatican Secret Archives. In Codice Ratio – Episode 1: Machine Transcription of the Manuscripts

Authors

  • Donatella Firmani
  • Marco Maiorino
  • Paolo Merialdo
  • Elena Nieddu
Abstract

In Codice Ratio is a research project to study tools and techniques for analyzing the contents of historical documents conserved in the Vatican Secret Archives (VSA). In this paper, we present our efforts to develop a system to support the transcription of medieval manuscripts. The goal is to provide paleographers with a tool that reduces their effort in transcribing large volumes, such as those stored in the VSA, by producing good transcriptions for significant portions of the manuscripts. We propose an original approach based on character segmentation. Our solution is able to deal with the dirty segmentation that inevitably occurs in handwritten documents. We use a convolutional neural network to recognize characters and language models to compose word transcriptions. Our approach requires minimal training effort, making the transcription process more scalable, since producing a training set requires only a few pages and can easily be crowdsourced. We have conducted experiments on manuscripts from the Vatican Registers, an unreleased corpus containing the correspondence of the popes. With training data produced by 120 high school students, our system has been able to produce good transcriptions that paleographers can use as a solid basis to speed up the transcription process at large scale.
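
The abstract's combination of character recognition and a language model over an imperfect segmentation can be illustrated with a small sketch. This is not the authors' implementation: the function `best_transcription`, the toy `candidates` scores (standing in for classifier outputs), and the toy bigram model `toy_bigram_logp` are all assumptions made up for illustration. The idea shown is a Viterbi-style search that chooses, among overlapping candidate segments, the character sequence with the best combined classifier and language-model score.

```python
import math

def best_transcription(n_cuts, candidates, bigram_logp, lm_weight=1.0):
    """Pick the best character sequence over candidate segments.

    candidates maps (start, end) cut indices (end > start) to a dict of
    {char: probability}, standing in for classifier output (hypothetical).
    bigram_logp(prev, ch) is a character bigram language model (hypothetical).
    """
    # best[(position, last_char)] = (score, text so far); "^" marks word start
    best = {(0, "^"): (0.0, "")}
    for pos in range(n_cuts):
        # States ending at this cut are final, since segments only move forward
        states = [(k, v) for k, v in best.items() if k[0] == pos]
        for (_, last), (score, text) in states:
            for (start, end), char_probs in candidates.items():
                if start != pos:
                    continue
                for ch, p in char_probs.items():
                    new = score + math.log(p) + lm_weight * bigram_logp(last, ch)
                    key = (end, ch)
                    if key not in best or new > best[key][0]:
                        best[key] = (new, text + ch)
    finals = [v for k, v in best.items() if k[0] == n_cuts]
    if not finals:
        return "", float("-inf")
    score, text = max(finals)
    return text, score

# Toy word image with 3 cuts: the second stroke group may be one wide
# character ("n" or "u") or two narrow ones ("i", "i").
candidates = {
    (0, 1): {"i": 0.9},
    (1, 3): {"n": 0.6, "u": 0.4},
    (1, 2): {"i": 0.5},
    (2, 3): {"i": 0.5},
}

# Toy bigram probabilities; unseen pairs get a small default.
TOY_BIGRAMS = {("^", "i"): 0.5, ("i", "n"): 0.5, ("i", "u"): 0.1, ("i", "i"): 0.05}

def toy_bigram_logp(prev, ch):
    return math.log(TOY_BIGRAMS.get((prev, ch), 0.01))

text, score = best_transcription(3, candidates, toy_bigram_logp)
print(text, score)
```

With these toy numbers the search prefers the reading "in" over "iu" and "iii", because the language model penalizes the unlikely character pairs even where the segmentation alone is ambiguous.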

Similar resources

In Codice Ratio: Scalable Transcription of Historical Handwritten Documents

Huge amounts of handwritten historical documents are being published by digital libraries world wide. However, for these raw digital images to be really useful, they need to be annotated with informative content. State-of-the-art Handwritten Text Recognition (HTR) approaches require an impressive training effort by expert paleographers. Our contribution is a scalable, end-to-end transcription w...

Designing an Ontology for Knowledge Discovery in Iran’s Vaccine

Ontology is a requirement engineering product and the key to knowledge discovery. It includes the terminology to describe a set of facts, assumptions, and relations with which the detailed meanings of vocabularies among communities can be determined. This is a qualitative content analysis research. This study has made use of ontology for the first time to discover the knowledge of vaccine in Ir...

The role and contribution of Kashan city in the field of Islamic manuscripts of Iran and Iraq

Purpose: Manuscripts, which are considered to be the cultural, scientific and artistic heritage of nations, are important from three aspects of cultural, scientific and artistic, and are considered to be signs of cultural power and scientific development of every country and region. In Iran, most of the manuscripts are in a few cities with a rich civilization history, science and culture, one o...

Drug Discovery Acceleration Using Digital Microfluidic Biochip Architecture and Computer-aided-design Flow

A Digital Microfluidic Biochip (DMFB) offers a promising platform for medical diagnostics, DNA sequencing, Polymerase Chain Reaction (PCR), and drug discovery and development. Conventional Drug discovery procedures require timely and costly manned experiments with a high degree of human errors with no guarantee of success. On the other hand, DMFB can be a great solution for miniaturization, int...

Add-on for High Throughput Screening in Material Discovery for Organic Electronics: “Tagging” Molecules to Address the Device Considerations

This work reflects the worth of intelligent modeling in controlling the nanostructure morphology in manufacturing organic bulk heterojunction (BHJ) solar cells. It suggests the idea of screening the pool of material design possibilities inspired by machine learning. To fulfill this goal, a set of experimental data on a BHJ solar cell with a donor structure of diketopyrrolopyrrole (DDP) and ...


Publication date: 2018